Learning nonlinear operators in latent spaces for real-time predictions of complex dynamics in physical systems

Predicting complex dynamics in physical applications governed by partial differential equations in real-time is nearly impossible with traditional numerical simulations due to high computational cost. Neural operators offer a solution by approximating mappings between infinite-dimensional Banach spaces, yet their performance degrades with system size and complexity. We propose an approach for learning neural operators in latent spaces, facilitating real-time predictions for highly nonlinear and multiscale systems on high-dimensional domains. Our method utilizes the deep operator network architecture on a low-dimensional latent space to efficiently approximate underlying operators. Demonstrations on material fracture, fluid flow prediction, and climate modeling highlight superior prediction accuracy and computational efficiency compared to existing methods. Notably, our approach enables approximating large-scale atmospheric flows with millions of degrees, enhancing weather and climate forecasts. Here we show that the proposed approach enables real-time predictions that can facilitate decision-making for a wide range of applications in science and engineering.

Predicting complex dynamics in physical applications governed by partial differential equations in real-time is nearly impossible with traditional numerical simulations due to high computational cost.Neural operators offer a solution by approximating mappings between infinite-dimensional Banach spaces, yet their performance degrades with system size and complexity.We propose an approach for learning neural operators in latent spaces, facilitating real-time predictions for highly nonlinear and multiscale systems on highdimensional domains.Our method utilizes the deep operator network architecture on a low-dimensional latent space to efficiently approximate underlying operators.Demonstrations on material fracture, fluid flow prediction, and climate modeling highlight superior prediction accuracy and computational efficiency compared to existing methods.Notably, our approach enables approximating large-scale atmospheric flows with millions of degrees, enhancing weather and climate forecasts.Here we show that the proposed approach enables real-time predictions that can facilitate decision-making for a wide range of applications in science and engineering.
Achieving universal function approximation is one of the most important tasks in the rapidly growing field of machine learning (ML).To this end, deep neural networks (DNNs) have been actively developed, enhanced, and used for a plethora of versatile applications in science and engineering including image processing, natural language processing (NLP), recommendation systems, and design optimization [1][2][3][4][5][6] .In the emerging field of scientific machine learning (SciML), DNNs are a ubiquitous tool for analyzing, solving, and optimizing complex physical systems modeled with partial differential equations (PDEs) across a range of scenarios, including different initial and boundary conditions (ICs, BCs), model parameters and geometric domains.Such models are trained from a finite dataset of labeled observations generated from a (generally expensive) traditional numerical solver (e.g., finite difference method (FD), finite elements (FEM), computational fluid dynamics (CFD), and once trained they allow for accurate predictions with real-time inference [7][8][9][10] ).
DNNs are conventionally used to learn functions by approximating mappings between finite dimensional vector spaces.Operator regression, a more recently proposed ML paradigm, focuses on learning operators by approximating mappings between abstract infinite-dimensional Banach spaces.Neural operators specifically, first introduced in 2019 with the deep operator network (DeepONet) 11 , employ DNNs to learn PDE operators and construct a surrogate model, which allows for fast inference and high generalization accuracy.Motivated by the universal approximation theorem for operators proposed by Chen & Chen 12 , DeepONet encapsulates and extends the theorem for deep neural networks 11 .The architecture of DeepONet features a DNN, which encodes the input functions at fixed sensor points (branch net), and another DNN, which encodes the information related to the spatio-temporal coordinates of the output function (trunk net).Since its first appearance, standard DeepONet has been employed to tackle challenging problems involving complex high-dimensional dynamical systems [13][14][15][16][17] .In addition, extensions of DeepONet have been recently proposed in the context of multi-fidelity learning [18][19][20] , integration of multiple-input continuous operators 21,22 , hybrid transferable numerical solvers 23 , transfer learning 24 , and physics-informed learning to satisfy the underlying PDE 25,26 .
Another class of neural operators is the integral operators, first instantiated with the graph kernel networks (GKN) introduced by 27 .In GKNs, the solution operator is expressed as an integral operator of Green's function which is modeled with a neural net and consists of a lifting layer, iterative kernel integration layers, and a projection layer.GKNs were found to be unstable for multiple layers and a new graph neural operator was developed in 28 based on a discrete non-local diffusion-reaction equation.Furthermore, to alleviate the inefficiency and cost of evaluating integral operators, the Fourier neural operator (FNO) 29 was proposed, in which the integral kernel is parameterized directly in the Fourier space.The input to the network, like in GKNs, is elevated to a higher dimension and then passed through numerous Fourier layers before being projected back to the original dimension.Each Fourier layer involves a forward fast Fourier transform (FFT), followed by a linear transformation of the low-Fourier modes and then an inverse FFT.Finally, the output is added to a weight matrix, and the sum is passed through an activation function to introduce nonlinearity.Different variants of FNO have been proposed, such as the FNO-2D which performs 2D Fourier convolutions and uses a recurrent structure to propagate the PDE solution in time, and the FNO-3D, which performs 3D Fourier convolutions through space and time.Compared to DeepONet, FNO, in its seminal paper 29 , employs evaluations restricted to an equispaced mesh to discretize both the input and output spaces, where the mesh and the domain must be the same.The interested reader is referred to 30 for a comprehensive comparison between DeepONet and FNO across a range of complex applications.Recent advancements in neural operator research have yielded promising results for addressing the bottleneck of FNO.Two such integral operators are the Wavelet Neural Operator (WNO) 31 and the Laplace Neural Operator (LNO) 32 , which have been proposed as alternative solutions for capturing the spatial behavior of a signal and accurately approximating transient responses, respectively.
Despite the impressive capabilities of the aforementioned methods to learn surrogates for complex PDEs, these models are primarily used in a data-driven manner, and thus a representative and sufficient labeled dataset needs to be acquired a-priori.Often, complex physical systems require high-fidelity simulations defined on fine spatial and temporal grids, which results in very high-dimensional datasets.Furthermore, the high (and often prohibitive) expense of traditional numerical simulators e.g., FEM allows for the generation of only a few hundred (and possibly even fewer) observations.The combination of few and very high-dimensional observations can result in sparse datasets that often do not represent adequately the input/output distribution space.In addition, raw high-dimensional physics-based data often consists of redundant features that can (often significantly) delay and hinder network optimization.Physical constraints cause the data to live on lower-dimensional latent spaces (manifolds) that can be identified with suitable linear or nonlinear dimension reduction (DR) techniques.Previous studies have shown how latent representations can be leveraged to enable surrogate modeling and uncertainty quantification (UQ) by addressing the 'curse of dimensionality' in highdimensional PDEs with traditional approaches such as Gaussian processes (GPs) and polynomial chaos expansion (PCE) [33][34][35][36][37] .Although neural network-based models can naturally handle high-dimensional input and output datasets, it is not clear how their predictive accuracy, generalizability, and robustness to noise are affected when these models are trained with suitable latent representations of the highdimensional data.
In this work, we aim to investigate the aforementioned open questions by exploring the training of DeepONet on latent spaces for high-dimensional time-dependent PDEs of varying degrees of complexity.The idea of training neural operators on latent spaces using DeepONet and autoencoders (AE) was originally proposed in 16 .In this work, the growth of a two-phase microstructure for particle vapor deposition was modeled using the Cahn-Hilliard equation.In another recent work 38 , the authors explored neural operators in conjunction with AE to tackle high-dimensional stochastic problems.But the general questions of the predictive accuracy and generalizability of DeepONet trained on latent spaces remain and require systematic investigation with comparisons to conventional neural operators.
The training of neural operators on latent spaces consists of a twostep approach: first, training a suitable AE model to identify a latent representation for the high-dimensional PDE inputs and outputs, and second, training a DeepONet model and employing the pre-trained AE decoder to project samples back to the physically interpretable highdimensional space (see Fig. 1).Related methods, in particular, the U-Net framework within the U-shaped neural operator (U-NO) 39 have aimed to achieve a similar objective.However, it's important to note that, while the U-Net framework within U-NO is commonly recognized as having encoder and decoder segments, these segments do not act as independent encoder and decoder components.Therefore, unlike AEs (and many other unsupervised dimension reduction methods), the encoder and decoder components cannot be disentangled from the original, high-dimensional data.The benefit of the proposed L-DeepONet framework, on the other hand, is that it is designed with independent encoder and decoder components to allow the direct construction of a neural operator in an arbitrarily learned lowdimensional space.It is therefore not constrained by the architecture or design of the encoder/decoder, which may be an AE (as studied here) or a different unsupervised dimension reduction method altogether 37 .
The L-DeepONet framework has two advantages: first, the accuracy of DeepONet is improved, and second, the L-DeepONet training is accelerated due to the low dimensionality of the data in the latent space.Combined with the pre-trained AE model, L-DeepONet can perform accurate predictions with real-time inference and learn the solution operator of complex time-dependent PDEs in lowdimensional space.The contributions of this work can be summarized as follows: • We investigate the performance of L-DeepONet, an extension of standard DeepONet, for high-dimensional time-dependent PDEs that leverages latent representations of input and output functions identified by suitable autoencoders (see Fig. 1).• We perform direct comparisons with vanilla DeepONet for complex physical systems, including brittle fracture of materials, and complex convective and atmospheric flows, and demonstrate that L-DeepONet consistently outperforms the standard approach in terms of accuracy and computational time.• We perform direct comparisons with another neural operator model, the Fourier neural operator (FNO), and two of its variants, i.e., FNO-2D and FNO-3D, and identify advantages and limitations for a diverse set of applications.• We perform direct comparisons with U-NO and report the accuracy, computational time, and the number of trainable parameters in the Supplementary Tables S2 and S3.
For all the problems considered in this work, we have generated the training data with a fixed spatio-temporal discretization to deploy an AE for dimensionality reduction.However, for more general problems where training data vary in fidelity or where the training data is provided at arbitrary points in space and time, linear (e.g.principal component analysis, linear discriminant analysis 40 ) and non-linear projection (e.g.bicubic interpolation, t-SNE 41 , diffusion maps 42 ) methods can be employed to create a shared continuous basis onto which the training data can be projected.This dual functionality offers the option to directly reduce dimensionality to enable the training of the DeepONet directly from the projected data.Alternatively, it facilitates the interpolation of the given training data onto a fixed grid, aligning with the requirement of AE.However, the implementation of such methods is beyond the scope of this work.

Results
To demonstrate the advantages and efficiency of L-DeepONet, we learn the operator for three diverse PDE models of increasing complexity and dimensionality.First, we consider a PDE that describes the growth of fracture in brittle materials which are widely used in various industries including construction and manufacturing.Predicting with accuracy the growth of fractures in these materials is important for preventing failures, improving safety, reliability, and cost-effectiveness in a wide range of applications.Second, we consider a PDE describing convective fluid flow, a common phenomenon in many natural and industrial processes.Understanding how these flows evolve may allow engineers to better design systems such as heat exchangers or cooling systems to enhance efficiency and reduce energy consumption.Finally, we consider a PDE describing largescale atmospheric flows which can be used to predict patterns that occur in weather systems.Such flows play a crucial role in the Earth's climate system influencing precipitation, and temperature which in turn may have a significant impact on water resources, agricultural productivity, and energy production.Developing an accurate surrogate to predict with detail such complex atmospheric patterns may allow us to better adapt to changes in the climate system and develop effective strategies to mitigate the impacts of climate change.For all PDEs, the input functions for the operator represent initial conditions modeled as Gaussian or non-Gaussian random fields.We perform direct comparisons of L-DeepONet with the standard DeepONet model trained on the full dimensional data and with FNO.More details about the models and the corresponding data generation process are provided in the Supplementary Section on Data Generation to assist the readers in readily reproducing the results presented below.

Brittle fracture in a plate loaded in shear
Fracture is one of the most commonly encountered failure modes in engineering materials and structures.Defects, once initialized, can lead to catastrophic failure without warning.Therefore, from a safety point of view, prediction of the initiation and propagation of cracks is of utmost importance.In the phase field fracture modeling approach, the effects associated with crack formation, such as stress release, are incorporated into the constitutive model 43 .Modeling fracture using the phase field method involves the integration of two fields, namely the vector-valued elastic field, u(x), and the scalar-valued phase field, ϕ x ð Þ 2 ½0,1, with 0 representing the undamaged state of the material and 1 a fully damaged state.
The equilibrium equation for the elastic field for an isotropic model, considering the evolution of crack, can be written as 44 : where σ is the Cauchy stress tensor, f is the body force and g(ϕ) = (1−ϕ) 2 represents the monotonically decreasing stressdegradation function that reduces the stiffness of the bulk material in the fracture zone.The elastic field is constrained by Dirichlet and Neumann boundary conditions: where t N is the prescribed boundary forces and u is the prescribed displacement for each load step.The Dirichlet and Neumann boundaries are represented by ∂Ω D and ∂Ω N , respectively.Considering the second-order phase field for a quasi-static setup, the governing equation can be written as: where G c is a scalar parameter representing the critical energy release rate of the material, l 0 is the length scale parameter, which controls the In the first step, a multi-layer autoencoder is trained using a combined dataset of the high-dimensional input and output realizations of a PDE model, fx i ,y i g N i = 1 , respectively.The trained encoder projects the data onto a latent space R d and the dataset on the latent space, fx r i ,y r i g N i = 1 is then used to train a DeepONet model and learn the operator G θ , where θ denotes the trainable parameters of the network.Finally, to evaluate the performance of the model on the original PDE outputs and perform inference, the pre-trained decoder is employed to map predicted samples back to physically interpretable space.
diffusion of the crack, H(x, t) is a local strain-history functional, and y c , l c represent the position and length of the crack respectively.For sharp crack topology, l 0 → 0 45 .H(x, t) contains the maximum positive tensile energy (Ψ + 0 ) in the history of deformation of the system.The strainhistory functional is employed to initialize the crack on the domain as well as to impose irreversibility conditions on the crack growth 46 .In this problem, we consider y c , l c to be random variables with y c ~U[0.3, 0.7] and l c ~U[0.4,0.6], thus, the initial strain function H(x, t = 0; l c , y c ) is also random (see the Supplementary Section on Data Generation).We aim to learn the solution operator G : Hðx,t = 0; l c ,y c Þ7 !ϕðx,tÞ which maps the initial strain-history function to the crack evolution.
In Fig. 2a, we show the mean-square error (MSE) between the studied models and ground truth.The left panel shows the MSE for the multi-layer autoencoder (MLAE) for different latent dimensions (d), where the violin plot shows the distribution of MSE from n = 5 independent trials.The right panel shows the resulting MSE for L-DeepONet operating on different latent dimensions (d) compared with the full high-dimensional DeepONet, FNO-2D, and FNO-3D.We observe that, regardless of the latent dimension, the L-DeepONet outperforms the standard DeepONet (Full DON) and performs comparably with FNO-2D and FNO-3D.In Fig. 3, a comparison between all models for a random representative result is shown.While L-DeepONet results in prediction fields almost identical to the reference, the predictions of the standard models deviate from the ground truth both inside and around the propagated crack.Finally, the cost of training the different models is presented in Table 1.Because the required network complexity is significantly reduced, the L-DeepONet is 1 − 2 orders of magnitude cheaper to train than the standard approaches.

Rayleigh-Bénard fluid flow convection
Rayleigh-Bénard convection occurs in a thin layer of fluid that is heated from below 47 .The natural fluid convection is buoyancy-driven and caused due to a temperature gradient ΔT.Instability in the fluid occurs when ΔT is large enough to make the non-dimensional Rayleigh number, Ra, exceed a certain threshold.The Rayleigh number whose physical interpretation is the ratio between the buoyancy and the viscous forces is defined as where α is the thermal expansion coefficient, g is the gravitational acceleration, h is the thickness of the fluid layer, ν is the kinematic viscosity and κ is the thermal diffusivity.When ΔT is small, the convective flow does not occur due to the stabilizing effects of viscous friction.Based on the governing conservation laws for an incompressible fluid (mass, momentum, energy) and the Boussinesq approximation according to which density perturbations affect only the gravitational force, the dimensional form of the Rayleigh-Bénard equations for a fluid defined on a domain Ω reads: where D/Dt denotes material derivative, u, p, T are the fluid velocity, pressure and temperature respectively, T 0 is the temperature at the lower plate, and x = (x, y) are the spatial coordinates.Considering two plates (upper and lower) the corresponding BCs and ICs are defined as where T 0 , and T 1 are the fixed temperatures of the lower and upper plates, respectively.For a 2D rectangular domain and through a nondimensionalization of the above equations, the fixed temperatures become T 0 = 0 and T 1 = 1.The IC of the temperature field is modeled as linearly distributed with the addition of a GRF, v(x) having correlation length scales ℓ x = 0.45, ℓ y = 0.4 simulated using a Karhunen-Loéve expansion.The objective is to approximate the operator G : Tðx,t = 0Þ7 !Tðx,tÞ (see the Supplementary Section on Data Generation).Figure 2b again shows violin plots of the MSE for the MLAE with differing latent dimensions and the MLE for the corresponding L-DeepONet compared with the other neural operators.Here we see that the reconstruction accuracy of the MLAE is improved by increasing the latent dimensionality up to d = 100.However, the change in the predictive accuracy of L-DeepONet for different values of d is less significant, indicating that latent spaces with even very small dimensions (d = 25) result in a very good performance.Furthermore, L-DeepONet outperforms all other neural operators with a particularly significant improvement compared to FNO.In Fig. 4, we observe that L-DeepONet is able to capture the complex dynamical features of the true model with high accuracy as the simulation evolves.In contrast, the standard DeepONet and FNO result in diminished performance as they tend to smooth out the complex features of the true temperature fields.Furthermore, the training time of the L-DeepONet is significantly lower than the full DeepONet and FNO as shown in Table 1.

Shallow-water equations
The shallow-water equations model the dynamics of large-scale atmospheric flows 48 .In a vector form, the viscous shallow-water  equations can be expressed as where Ω = (λ, ϕ) represents a spherical domain where λ, ϕ are the longitude and latitude respectively ranging from [ − π, π], V = iu + jv is the velocity vector tangent to the spherical surface (i and j are the unit vectors in the eastward and northward directions respectively and u, v the velocity components), and h is the height field which represents the thickness of the fluid layer.Moreover, f = 2Ξ sin ϕ is the Coriolis parameter, where Ξ is the Earth's angular velocity, g is the gravitational acceleration and ν is the diffusion coefficient.
As an initial condition, we consider a zonal flow which represents a typical mid-latitude tropospheric jet.The initial velocity component u is expressed as a function of the latitude ϕ as where u max is the maximum zonal velocity, ϕ 0 , and ϕ 1 represent the latitude in the southern and northern boundary of the jet in radians, respectively, and n = exp½À4=ðϕ 1 À ϕ 0 Þ 2 is a non-dimensional parameter that sets the value u max at the jet's mid-point.A small unbalanced perturbation is added to the height field to induce the development of barotropic instability.The localized Gaussian perturbation is described as h 0 ðλ,ϕ,t = 0Þ = ĥ cosðϕÞ exp½Àðλ=αÞ where − π < λ < π and ĥ,ϕ 2 ,α,β are parameters that control the location and shape of the perturbation.We consider α, β to be random variables with α ∼ U½0: 1,0:5 and β ∼ U½0:0 3,0:2 so that the input Gaussian perturbation is random.The localized perturbation is added to the initial height field, which forms the final initial condition h(λ, ϕ, t = 0) (see Supplementary Section on Data Generation).The objective is to approximate the operator G : hðλ,ϕ,t = 0Þ7 !uðλ,ϕ,tÞ.This problem is particularly challenging as the fine mesh required to capture the details of the convective flow both spatially and temporally results in output realizations having millions of dimensions.
Unlike the previous two applications, here the approximated operator learns to map the initial condition of one quantity, h(λ, ϕ, t = 0), to the evolution of a different quantity, u(λ, ϕ, t).Given the difference between the input and output quantities of interest (in scale and features), a single encoding of the combined data as in the standard proposed approach (see Fig. 1) is insufficient.Instead, two separate encodings are needed for the input and output data, respectively.While an autoencoder is used to reduce the dimensionality of the output data representing the longitudinal component of the velocity vector u, standard principal component analysis (PCA) is performed on the input data due to the small local variations in the initial random height field h which results in a small intrinsic dimensionality.
Results, in terms of MSE, are presented in Fig. 2c, where again we see that the L-DeepONet outperforms the standard approach while changes in the latent dimension do not result in significant differences in the model accuracy.Consistent with the results of the previous application, the training cost of the L-DeepONet is much lower than the full DeepONet (Table 1).We further note that training FNO for this problem (either FNO-2D or FNO-3D) proved computationally prohibitive.For a moderate 3D problem with spatial discretization beyond 64 3 , the latest GPU architectures such as the NVIDIA Ampere GPU do not provide sufficient memory to process a single training sample 49 .Data partitioning across multiple GPUs with distributed memory, model partitioning techniques like pipeline parallelism, and domain decomposition approaches 49 can be implemented to handle highdimensional tensors within the context of an automatic differentiation framework to compute the gradients/sensitivities of PDEs and thus optimize the network parameters.This advanced implementation is beyond the scope of this work as it proves unnecessary for the studied approach.Consequently, a comparison to the FNO is not shown here.A variant of FNO, the Spherical Fourier neural operator (S-FNO) has been tailored for problems on spherical domains such as shallow water equations 50 .However, S-FNO is primarily focused on mapping the initial condition to the solution at a final timestamp.In contrast, our objective is to learn the mapping from the initial condition to the solution time history, enabling the generation of a sequence of solutions at arbitrary time instants.Hence, we have not provided a comparison between L-DeepONet and S-FNO for this problem.Figure 5, shows the evolution of the L-DeepONet and the full DeepONet compared to the ground truth for a single realization.The L-DeepONet consistently captures the complex nonlinear dynamical features for all time steps, while the full model prediction degrades over time and again smoothing the results such that it fails to predict extreme velocity values for each time step that can be crucial, e.g., in weather forecasting.

Discussion
We have investigated latent DeepONet (L-DeepONet) for learning neural operators on latent spaces for time-dependent PDEs exhibiting highly non-linear features both spatially and temporally and resulting in high-dimensional observations.The L-DeepONet framework leverages autoencoder models to cleverly construct compact representations of the high-dimensional data while a neural operator is trained on the identified latent space for operator regression.Both the advantages and limitations of L-DeepONet are demonstrated in a collection of diverse PDE applications of increasing complexity and data dimensionality.As presented, L-DeepONet provides a powerful tool in SciML and UQ that improves the accuracy and generalizability of neural operators in applications where high-fidelity simulations are considered to exhibit complex dynamical features, e.g., in climate models.
A systematic comparison with standard DeepONet 11 and FNO 29 revealed that L-DeepONet improves the quality of results and it can capture with greater accuracy the evolution of the system represented by a time-dependent PDE.This result is more noticeable as the dimensionality and non-linearity of dynamical features increase (e.g., in complex convective fluid flows).Another advantage is that L-DeepONet training requires less computational resources, as standard DeepONet and FNO are trained on the full-dimensional data and are thus, more computationally demanding and require much larger memory (see Table 1).For all applications, we found that a small latent dimensionality (d ≤ 100) is sufficient for constructing powerful neural operators, by removing redundant features that can hinder the network optimization and thus its predictive accuracy.Furthermore, L-DeepONet can alleviate the computational demand and thus enable tasks that require the computation of kernel matrices, e.g., used in transfer learning for comparing the statistical distance between data distributions 24 .
Despite the advantages of learning operators in latent spaces, there are certain limitations that warrant discussion.L-DeepONet trains DR models to identify suitable latent representations for the combined input and output data.However, as shown in the final application, in cases where the approximated mapping involves heterogeneous quantities, two independent DR models need to be constructed.While in this work we found that simple MLAE models result in the smallest L-DeepONet predictive error, a preliminary study regarding the suitability of the DR approach needs to be performed for all quantities of interest.Another disadvantage is that the L-DeepONet as formulated is unable to interpolate in the spatial dimensions.The current L-DeepONet consists of a modified trunk net where the time component has been preserved while the spatial dimensions have been convolved.Thus, L-DeepONet can be used for a time but not for space interpolation/extrapolation.Finally, L-DeepONet cannot be readily employed in a physics-informed learning manner since the governing equations are not known in the latent space and therefore cannot be directly imposed.These limitations motivate future studies that continue to assist researchers in the process of constructing accurate and generalizable surrogate models for complex PDE problems prevalent in physics and engineering.
In the context of constructing accurate and generalizable surrogate models, the authors in 51 demonstrate that a slightly modified DeepONet training can achieve an order higher accuracy than its vanilla counterpart 11 .This improvement is realized through the adoption of a two-step training strategy, where the trunk network is initially trained, followed by sequential training of the branch network.The mechanism involves decomposing the entire complex non-convex training task into two subtasks, and the introduction of the Gram-Schmidt orthonormalization process with QR decomposition enhances the stability and generalization capabilities of the model.To demonstrate the effectiveness of the method proposed in 51 , we obtained the results for brittle fracture in a plate loaded in shear, shown in Table 2. Furthermore, our observations reveal that substituting QR decomposition with Singular Value Decomposition (SVD) contributes to an enhanced accuracy of the model.The results indicate that the modified training framework successfully mitigates overfitting issues.However, it is worth noting that, in its present form, the framework faces limitations in handling mini-batching of the training dataset.

Problem statement
Neural operators learn nonlinear mappings between infinite dimensional functional spaces on bounded domains and provide a unique simulation framework for real-time inference of complex parametric PDEs.Let Ω & R D be a bounded open set and X = XðΩ; R d x Þ and Y = YðΩ; R d y Þ two separable Banach spaces.Furthermore, assume that G : X !Y is a non-linear map arising from the solution of a time- where Θ is a finite-dimensional parameter space.In this standard setting, the optimal parameters θ * are learned through training the neural operator (e.g., via DeepONet, FNO) with a set of labeled observations fx j ,y j g N j = 1 generated on a discretized domain Ω m = {x 1 , …, x m } ⊂ Ω where fx j g m j = 1 represent the sensor locations, thus x jjΩ m 2 R D x and y jjΩ m 2 R D y where D x = d x × m and D y = d y × m.Representing the domain discretization with a single parameter m, corresponds to the simplistic case where mesh points are equispaced.However, the training data of neural operators are not restricted to equispaced meshes.For example, for a time-dependent PDE with two spatial and one temporal dimension with discretizations m s , m t respectively, the total output dimensionality is computed as

Approximating nonlinear operators on latent spaces via L-DeepONet
In physics and engineering, we often consider high-fidelity timedependent PDEs generating very high-dimensional input/output data with complex dynamical features.To address the issue of high dimensionality and improve the predictive accuracy we employ L-DeepONet which allows the training of DeepONet on latent spaces.The approach involves two main steps: (1) the nonlinear DR of both input and output data fx j ,y j g N j = 1 via a suitable and invertible DR technique, (2) learning of a DeepONet model on a latent space and inverse transformation of predicted samples back to the original space.This process is defined as where J θ encoder ,J θ decoder are the two parts of a DR method, r corresponds to data on the reduced space, G θ is the approximated latent operator and θ its trainable parameters.While the encoder J θ encoder is used to project high-dimensional data onto the latent space, the decoder J θ decoder is employed during the training of DeepONet to project predicted samples back to original space and evaluate its accuracy on the full-dimensional data fx j ,y j g N j = 1 .Once trained, L-DeepONet can be used for real-time inference at no cost.We note that the term 'L-DeepONet' refers to the trained DeepONet model together with the pre-trained encoder and decoder parts of the autoencoder which are required to perform inference in unseen samples (see Fig. 1).Next, the distinct parts of the L-DeepONet framework are elucidated in detail.

Learning latent representations
The first objective is to identify a latent representation for the highdimensional input/output PDE data.Compressing the data to a reduced representation will not only allow us to accelerate the DeepONet training but, as shown above, improves predictive performance and robustness.To this end, we employ autoencoders `due to their flexibility in the choice of the model architecture and the inherent inverse mapping capability.We note that the proposed framework allows for the adoption of any suitable linear or nonlinear DR method provided the existence of an inverse mapping.In this work, the objective is to demonstrate that DR enhances the accuracy of neural operators rather than establishing which DR method is the most advantageous.The latter depends on various factors including accuracy, generalizability, and computational cost.For our demonstrations, we apply AEs that we found to perform comparably or better than PCA across our diverse set of PDEs through systematic study (see Table 3 and Supplementary Fig. S7).However, the choice of DR approach can be problem and resource-dependent so, although AEs generally outperform PCA, PCA is found to be a viable approach for many problems and under certain conditions.
We train unsupervised autoencoder model J θ ae and perform hyperparameter tuning to identify the optimal latent dimensionality d, where d ≪ D x , D y .Assume a time-dependent PDE, where d x corresponds to the dimensionality of the input space and m s , m t the spatial and temporal discretizations of the generated data.In order to feed the autoencoder model with image-like data, the PDE outputs are reshaped into distinct snapshots, i.e., fŷ i g N × m t i = 1 .Finally, input and output data are concatenated into a single dataset fz i g . The two parts of the autoencoder model, which are trained concurrently, are expressed as where fx r i g Results for both the maximum and minimum d values tested for each application are provided.To evaluate the performance of L-DeepONet, we compute the mean square error of predictions, and we report the mean and standard deviation of this metric based on five independent training trials.and θ decoder respectively.The optimal set the autoencoder parameters θ ae = {θ encoder , θ decoder } are obtained via the minimization of the loss function where ∥ ⋅ ∥ 2 denotes the standard Euclidean norm and z fx,ỹg denotes the reconstructed dataset of combined input and output data.From a preliminary study, which is not shown here for the sake of brevity, we investigated three AE models, simple autoencoders (vanilla-AE) with a single hidden layer, multi-layer autoencoders (MLAE), with multiple hidden layers and convolutional autoencoders (CAE) which convolve data through convolutional layers.We found that MLAE performs best, even with a small number of hidden layers (e.g., 3).Furthermore, the use of alternative AE models which are primarily used as generative models, such as variational autoencoders (VAE) 52 or Wasserstein autoencoders (WAE) 53 , resulted in significantly worse L-DeepONet performance.Although such models resulted in good reconstruction accuracy and thus can be used to reduce the data dimensionality and generate synthetic yet realistic samples, we found that the obtained submanifold is not well-suited for training the neural operator, as it may result in the reduction of data variability or even representation collapse.
Training neural operator on latent space (L-DeepONet) Once the autoencoder model is trained and the reduced data {x r , y r } are generated, we aim to approximate the latent representation mapping with an unstacked DeepONet G θ , where θ are the trainable model parameters.As shown in Fig. 1, the unstacked DeepONet consists of two concurrent DNNs, a branch net which encodes the inputs realizations x r 2 R d (in this case the reduced input data) evaluated at the reduced spatial locations {x 1 , x 2 , …, x d }.On the other hand, the trunk net takes as input the temporal coordinates ζ = ft i g m t i = 1 at which the PDE output is evaluated.The solution operator for an input realization, x 1 , can be expressed as: where ½b 1 ,b 2 , . . .,b p T is the output vector of the branch net, ½tr 1 ,tr 2 , . . .,tr p T the output vector of the trunk net and p denotes a hyperparameter that controls the size of the final hidden layer of both the branch and trunk net.The trainable parameters of the DeepONet, represented by θ in Eq. ( 14), are obtained by minimizing a loss function, which is expressed as: where L r ðθÞ, L i ðθÞ denote the residual loss and the initial condition loss respectively, y r the reference reduced outputs and ỹr the predicted reduced outputs.In this work, we only consider the standard regression loss L r ðθÞ, however, additional loss terms can be added to the loss function.The branch and trunk networks can be modeled with any specific architecture.Here we consider a CNN for the branch net architecture and a feed-forward neural network (FNN) for the trunk net to take advantage of the low dimensions of the evaluation points, ζ.To feed the branch net of L-DeepONet the reduced output data are reshaped to R ffiffi ffi , thus it is advised to choose square latent dimensionality values.Once the optimal parameters θ are obtained, the trained model can be used to predict the reduced output for novel realizations of the input x 2 R D x .Finally, the predicted data are used as inputs to the pre-trained decoder J θ decoder , to transform results back to the original space and obtain the approximated full-dimensional output y rec 2 R D y .We note that the training cost of L-DeepONet is significantly lower compared to the standard model, due to the smaller size of the network and the reduced total number of its trainable parameters (see Table 4).

Error metric
To assess the performance of L-DeepONet we consider the MSE evaluated on a set of N test test realizations where y 2 R D y is the reference and y rec 2 R D y the predicted output respectively.More details on how this framework is implemented for different PDE systems of varying complexity can be found in the Results section.Information regarding the choice of neural network architectures and generation of training data are provided in the Supplementary Tables S2 and S3 as well as Supplementary Section of Data Generation.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Fig. 1 |
Fig.1| Latent DeepONet (L-DeepONet) framework for learning deep neural operators on latent spaces.In the first step, a multi-layer autoencoder is trained using a combined dataset of the high-dimensional input and output realizations of a PDE model, fx i ,y i g N i = 1 , respectively.The trained encoder projects the data onto a latent space R d and the dataset on the latent space, fx r i ,y r i g N i = 1 is then used to train a

Fig. 2 |
Fig.2| Results of all applications.Left: Results for the multi-layer autoencoders (MLAE) for different values of the latent dimensionality.Right: Results for all the studied neural operators.For all panes, violin plots are generated from 5 independent trainings of the models using different random seed numbers.

Fig. 3 |
Fig. 3 | Brittle fracture in a plate loaded in shear: results of a representative sample with y c = 0.55 and l c = 0.6 for all neural operators.The results of the L-DeepONet model consider the latent dimension, d = 64.The neural operator is

Fig. 4 |
Fig. 4 | Rayleigh-Bénard convective flow: results of the temperature field of a representative sample for all neural operators.The results of the L-DeepONet model consider the latent dimension, d = 100.The neural operator is trained to approximate the growth of the evolution of the temperature field from a realization of the initial temperature field for seven time steps.

Fig. 5 |
Fig. 5 | Shallow water equations: results of the evolution of the velocity field through eight time steps for all the operator models considered in this work, for a representative realization of the initial perturbation to the height field.The results of the L-DeepONet model consider the latent dimension, d = 81.

Table 1 |
Comparison of the computational training time in seconds (s) for all the neural operators across all considered applications, identically trained on an NVIDIA A6000 GPU

Table 2 |
Comparative computational accuracy of vanilla DeepONet training and modified DeepONet training 51 for brittle fracture in a plate loaded in shear

Table 4 |
Comparison of the number of trainable parameters for all the neural operators across all considered applications